At work today, Matt noted that he found Digg’s algorithm far more interesting than Google’s. I was shocked – after all, Digg isn’t nearly as complex or widely used as Google – but with its rising popularity in the tech space, I could at least understand why he might feel that way. I also took it as a challenge to lay out all the possible elements that might be in an algorithm at Digg, Reddit, Netscape, Shoutwire or other social-news-voting sites. Let’s see how I do:
BTW – I’m going to use a lot of Digg-specific terminology, despite the fact that I’m referring to all of the sites above.
- Number of votes over time
  - Uses a floating target based on relative levels of popularity (as mentioned in timing below)
  - Any number of votes in a very short period (if not manipulative) is stronger than the same number of votes over a longer period.
- Domain of link
  - Has it previously had content submitted? If so, did that content receive votes, get marked as spam/lame, make the front page, etc.?
  - Has the domain been manually/automatically flagged for being manipulative?
- Profile of submitter
  - Have they submitted high quality stories in the past?
  - Have they submitted spam/lame stories in the past?
  - How many friends do they have? This could make it harder or easier to get a story Dugg (harder if they have thousands of friends, but possibly easier if they have at least a few)
  - How many submissions have they made? What is their success rate?
  - How long has the member been around? New registrants could be a clear sign of spam
- Profiles of voters (as above)
- Timing of submission
  - If a low number of stories have recently made the front page in a given sector or overall, the story is more likely to get on top with fewer votes
  - If there has been a high number of recent submissions, the opposite may be true
  - Time of day – if 50 people all tag a site at 3:00am, that might be a red flag
- Similarity to other links (duplicate)
- Source of votes
  - From the same IP address or IP block
  - From the same geographic region (that’s not a hotspot for Digg users)
  - From the same group as has voted on previous content from a domain or string of domains
  - From a group of users who aren’t regular participants/voters
- Manual review as it hits the homepage
  - Many Digg users may not realize it, but every story that hits the front page gets a manual editorial review that may pull the story. This often happens with content the editors feel is marketing-focused, driven by marketing dollars or has a marketing agenda.
  - Reddit does this, too, but it’s not instantaneous
  - Netscape used to do it, but some have speculated that the level of oversight fluctuates
  - As a quick example, Brian Clark (of Copyblogger) had this post hit Digg’s homepage last week for a scant minute or so before the editors pulled it.
- Number of comments
  - Potentially could be used to detect patterns, though I’ve seen a lot of Dugg stories that had very few comments, so this might not be a great signal
- Number of views
  - An abnormally high ratio of views with few Diggs could mean that people aren’t fans of the content
  - In my opinion, this is a low signal, and down votes or lame/spam would earn more weight in bringing down a story
- Down votes
  - Although Digg doesn’t specifically have them, Reddit does and surely uses them as an influential factor
  - Digg, Netscape and Shoutwire all use flag systems which could be similarly interpreted
- Source of votes (how voters reached the page)
  - I suspect that Digg would follow how users normally reach pages (through friends, via direct links, via email/type-in, etc.)
  - If an abnormally high number of folks came via an uncommon method to a Digg page (for example, with no referring URL, possibly signifying a mass email or IM link), Digg might want to discount the value of those votes
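To make a few of the guesses above concrete, here is a rough Python sketch of how vote speed, vote source, submitter history and a floating front-page target might combine. To be clear, this is purely my own illustration and not how Digg or any other site actually works: the names (`Vote`, `story_score`, `promotion_threshold`) and every weight and formula in it are made-up assumptions.

```python
from collections import Counter
from typing import NamedTuple


class Vote(NamedTuple):
    """One vote on a story. All fields are hypothetical, for illustration only."""
    minutes_after_submit: float  # when the vote arrived
    ip_block: str                # e.g. the first three octets of an IPv4 address
    has_referrer: bool           # False might hint at a mass email or IM link


def vote_weight(vote, block_counts):
    """Discount votes that share an IP block or arrive with no referring URL."""
    # Votes from one IP block split a single vote's worth of credit (assumed rule).
    weight = 1.0 / block_counts[vote.ip_block]
    # Referrer-less votes count for half (assumed factor).
    if not vote.has_referrer:
        weight *= 0.5
    return weight


def story_score(votes, submitter_success_rate):
    """Score a story: weighted votes, boosted when they arrive in a short span."""
    if not votes:
        return 0.0
    block_counts = Counter(v.ip_block for v in votes)
    raw = sum(vote_weight(v, block_counts) for v in votes)
    times = [v.minutes_after_submit for v in votes]
    span_hours = (max(times) - min(times)) / 60.0
    # Same votes over a longer span score lower (assumed damping formula).
    velocity_adjusted = raw / (1.0 + span_hours)
    # Scale by the submitter's past success rate, between 0.0 and 1.0 (assumed).
    return velocity_adjusted * (0.5 + submitter_success_rate)


def promotion_threshold(base, recent_promotions, typical_promotions):
    """Floating target: fewer recent front-page stories means a lower bar."""
    return base * (1 + recent_promotions) / (1 + typical_promotions)


# Ten votes in ten minutes versus ten votes over nine hours.
burst = [Vote(i, f"10.0.{i}", True) for i in range(10)]
slow = [Vote(i * 60, f"10.0.{i}", True) for i in range(10)]
print(story_score(burst, 0.5) > story_score(slow, 0.5))
```

A story would make the front page in this toy model when `story_score(...)` exceeds `promotion_threshold(...)`. A real system would presumably tune these numbers against actual voting data and fold in the other signals (domain history, flags, manual review) as well.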
In a wonderful irony, the Digg website appears to have crashed tonight (a likely cause could be the new re-design, which Neil details at SELand).
So, what do you think? Are there other elements you’d consider having in your own social media voting site? Any obvious ones I neglected to mention?